Shallow Discourse Genre Annotation in CallHome Spanish
Abstract
The classification of speech genre is not yet an established task in language technologies. However, we believe it will become increasingly important as large amounts of audio (and video) data become widely available. The technological capability to easily transmit and store all human interactions as audio and video could have a radical impact on our social structure. The major open question is how this information can be used in practical and beneficial ways. As a first approach to this question, we look at issues of information access to databases of human-human interactions. Classification by genre is a first step in retrieving a document from a large collection. In this paper we introduce a local notion of speech activities that exist side by side within conversations of a single speech genre: while the genre of CallHome Spanish is personal telephone calls between family members, the actual calls contain activities such as storytelling, advising, and interrogation. We present experimental work on detecting these activities using a variety of features. We have also observed that a limited number of distinguished activities can be defined that precisely describe most of the activities in this database.
Proceedings of the Second International Conference on Language Resources and Evaluation, LREC 2000, Athens, Greece, 31 May-2 June 2000
Similar resources
Clarity: Inferring Discourse Structure from Speech
The goal of the CLARITY project is to explore the use of discourse structure in the understanding of conversational speech. Within project CLARITY we aim to develop automatic classifiers for three levels of discourse structure in Spanish telephone conversations: speech acts, dialogue games, and discourse segments. This paper presents our first results and research plans in three areas: definiti...
Towards Discourse Parsing in Spanish (Cunha)
Texts can be analysed from different perspectives. One of the most difficult phenomena to process is discourse structure (Hovy 2010). In recent years, discourse parsing has been one of the main challenges in natural language processing (NLP). Research on this topic has been carried out for several languages, such as Japanese (Sumita et al. 1992), English (Marcu 2000) and Portuguese (Pardo...
Multi-Layer Discourse Annotation of a Dutch Text Corpus
We have compiled a corpus of 80 Dutch texts from expository and persuasive genres, which we annotated for rhetorical and genre-specific discourse structure, and lexical cohesion with the goal of creating a gold standard for further research. The annotations are based on a segmentation of the text in elementary discourse units that takes into account cues from syntax and punctuation. During the ...
Exploiting Semantic Information for Manual Anaphoric Annotation in Cast3LB Corpus
This paper presents the discourse annotation used in Cast3LB, a Spanish corpus annotated with several information sources (morphological, syntactic, semantic and coreferential) at the syntactic, semantic and discourse levels. The 3LB annotation scheme has been developed for three languages (Spanish, Catalan and Basque). Human annotators have used a set of tagging techniques and protocols. Several to...
A Discourse Coding Scheme for Conversational Spanish (In Proceedings of ICSLP-98)
This paper describes a three-level discourse coding scheme that we have devised for manual tagging of the CallHome Spanish (CHS) and CallFriend Spanish (CFS) databases used in the CLARITY project. The goal of CLARITY is to explore the use of discourse structure in understanding conversational speech. The project combines empirical methods for dialogue processing with state-of-the-art LVCSR (...